Automated annotation of chemical names in the literature with tunable accuracy
نویسندگان
چکیده
BACKGROUND A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound name that are relevant to the topic of the given literature can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for these works because well-trained indexers can understand the paper topics as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. RESULTS An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations on biomedical and chemical literature that would be created by indexers. When comparing automated free text matching to those indexed manually of 26 thousand MEDLINE abstracts, more than 40% of the annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96% with a 28% recall rate. The best performance of MAA, as measured with the F statistic was 66%, which favorably compares to other chemical name annotation systems. CONCLUSIONS Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work is tested against MEDLINE, but the algorithm is not specific to this corpus and it is possible that the algorithm can be applied to papers from chemical physics, material, polymer and environmental science, as well as patents, biological assay descriptions and other textual data.
منابع مشابه
Tunable Machine Vision-Based Strategy for Automated Annotation of Chemical Databases
We present a tunable, machine vision-based strategy for automated annotation of virtual small molecule databases. The proposed strategy is based on the use of a machine vision-based tool for extracting structure diagrams in research articles and converting them into connection tables, a virtual "Chemical Expert" system for screening the converted structures based on the adjustable levels of est...
متن کاملA CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images
Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...
متن کاملCheNER: a tool for the identification of chemical entities and their classes in biomedical literature
BACKGROUND Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult due to both, the diverse morphology of chemical names and the alternative types of nomenclature that are simultaneously used to describe them. To address t...
متن کاملDual-band, Dynamically Tunable Plasmonic Metamaterial Absorbers Based on Graphene for Terahertz Frequencies
In this paper, a compact plasmonic metamaterial absorber for terahertz frequencies is proposed and simulated. The absorber is based on metamaterial graphene structures, and benefits from dynamically controllable properties of graphene. Through patterning graphene layers, plasmonic resonances are tailored to provide a dual band as well as an improved bandwidth absorption. Unit cell of the design...
متن کاملبهبود صحت ابهامزدایی نام نویسنده با استفاده از خوشهبندی تجمّعی
Today, digital libraries are important academic resources including millions of citations and bibliographic essential information such as titles, author's names and location of publications. From the view of knowledge accumulation management, the ability to search fast, accurate, desired contents, has a great importance. The complexity and similarity in these resources cause many challenges and...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 3 شماره
صفحات -
تاریخ انتشار 2011